Abstract

Dependable mobile computing is enhanced by independent
recovery, low power consumption and no dependence
on stable storage at the mobile host. Existing recovery
protocols proposed for mobile environments typically create
consistent global checkpoints that do not guarantee
independent recovery and low power consumption. This paper
demonstrates the advantages of message logging by
describing a receiver-based logging protocol. Checkpointing
is utilized to limit log size and recovery latency. We
compare the performance of our approach with that of existing
mobile checkpointing and recovery algorithms in terms of
failure-free overhead and recovery time. We also describe a
stable storage management scheme for mobile support
stations. Garbage collection is achieved without direct
participation of mobile hosts.

1  Introduction 

Mobile computing [1,2] presents new challenges and
requirements for checkpointing and recovery protocols [3].
Failures such as loss of connection or power outages that
are rare in fixed networks can be common in mobile
environments. Recovery algorithms are required to tolerate
multiple simultaneous failures and failure during recovery,
and it is desirable that processes be able to recover
independently. Coordinated recovery among processes running
on Mobile Hosts (MH) may slow down recovery [4] and
increase the chance of having multiple rollbacks of the
entire system in order to handle errors during recovery.
Conserving battery power by limiting the number of
extra messages during checkpointing and recovery is also
important. Limiting additional transmitted messages has
the added benefit of reducing contention on the wireless
network.

As an MH may be lost or permanently damaged, hard
drives on mobile hosts are not generally considered stable
storage. Therefore they are not suitable as the only
location for storing checkpoints or message logs. Traditional
checkpointing and message logging algorithms [5-12] are
not directly applicable under such conditions. Previous
proposals have suggested that checkpoints be sent back to
Home Agents (HA) [13]. Others have proposed that stable
storage on Mobile Support Stations (MSS) be used for
checkpoints and message logs [14-16]. Because checkpoints
and/or message logs are stored on different MSSs
as an MH moves from cell to cell, the organization of the
distributed process state information is important for
successful recovery. Several algorithms have been proposed to
solve these problems [14, 17]. Garbage collection of stable
storage on MSSs is also important. When state information
on an MSS is no longer needed for recovery, it should
be discarded to make room for new checkpoints and
message logs. Most present schemes require cooperation from
mobile hosts. If one participating MH fails, some stable
storage may never be collected for reuse.

Checkpointing and recovery protocols previously
proposed for mobile environments have typically saved
consistent global checkpoints [13, 15, 18, 19]. This requires all
participating mobile hosts to roll back during recovery, but
some mobile hosts may not be able to roll back due to
transient or permanent failures. This approach forces
application messages to be resent over the slow wireless network
during recovery, resulting in slow recovery and additional
power usage at the MH. Recent work has shown through
analytic modeling that message logging can be an attractive
approach to recovery in mobile environments [14].
Another recent research project has derived mechanisms for
managing stable storage on the MSS [20].

This paper describes a receiver-based pessimistic
message logging protocol for MH, MSS and HA, and a
distributed state organization scheme for mobile computing
environments. Using our approach, processes running on
mobile hosts are able to recover quickly and
independently of other processes. The protocol is experimentally
compared with an ideal consistent checkpoint protocol to
demonstrate that message logging incurs similar
failure-free overhead and achieves much faster recovery in a
wireless network implementation. This approach requires no
extra control messages sent by the MH. Garbage collection
is achieved without direct participation of the MH. Even if
the MH permanently fails, state information left on MSSs
can be identified and discarded.

2  The  Mobile  IP  Environment 

The mobile computing environment used in this paper is
based on the Mobile IP architecture [2]. This environment,
as illustrated in Figure 1, contains fixed hosts connected
by a backbone network and mobile hosts that use a
wireless interface to communicate with fixed hosts and other
mobile hosts. Each MH is associated with a home
network on which the MH receives packets like a normal fixed
host. Each MH is also assigned a home address that has
the subnet prefix of its home network. The home address
never changes, regardless of the MH's movement. Mobile
support stations (foreign agents) are those fixed hosts that
have both a wireless interface and a fixed network
(Ethernet, ATM, etc.) interface. They function as routers and
provide connections for mobile hosts to the entire network.
The area that a mobile support station's wireless interface
serves is called a cell. As mobile hosts move from cell to
cell, their IP addresses have to be changed to reflect the
subnet mask of their new mobile support stations. This can
cause difficulty in maintaining a connection as the MH
traverses cells. Mobile IP solves this problem by providing
both home agents and foreign agents.

When communicating with a mobile host, other hosts
always send packets to the mobile host's home address. A
home agent executes on the mobile host's home network. It
maintains current location information of the mobile hosts.
Packets destined for mobile hosts are intercepted by the
home agent and then tunneled to the current foreign agent
that is serving the mobile host. Packets are then
delivered to the mobile host by the foreign agent. Packets sent
by mobile hosts are generally delivered to their
destination using standard IP routing mechanisms, not necessarily
through the home agent. An example is illustrated in
Figure 1. Packets from mobile host A follow the dotted line
and pass an FA, an HA, and an FA, before reaching mobile
host B.
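The forwarding decision made by the home agent can be sketched as a lookup in a binding table. The following is a minimal illustration only, not part of Mobile IP itself: the class name `HomeAgent`, the method names, and the addresses are hypothetical, and tunneling is reduced to returning the next hop.

```python
class HomeAgent:
    """Toy binding table: home address -> serving foreign agent (assumed names)."""

    def __init__(self):
        self.bindings = {}

    def register(self, home_address, foreign_agent):
        # Invoked when the MH registers from a new cell.
        self.bindings[home_address] = foreign_agent

    def next_hop(self, destination):
        # Tunnel to the foreign agent if the MH is away; otherwise the packet
        # stays on the home network (modeled by returning the address itself).
        return self.bindings.get(destination, destination)


ha = HomeAgent()
ha.register("10.0.0.7", "fa-cell-2")   # hypothetical addresses
print(ha.next_hop("10.0.0.7"))
```

Because senders always use the home address, only the binding table changes when the MH moves.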

One of the distinctive features of this architecture is the
existence of home agents. When an MH switches cells, the
home agent must know where the mobile host is located
before any future packets can be delivered to the mobile host.
This suggests that the home agent might be an attractive
place to log messages for mobile hosts. However, there are
routing optimizations proposed in the literature [21] that
route some packets directly to the foreign agent of the
mobile host instead of through its home agent. On the other
hand, we observe that all packets sent to mobile hosts must
reach the mobile support station first before transmission
through the wireless interface. In our approach stable
storage at the MSS is used to store checkpoints and message
logs for mobile hosts.

Figure 1: The mobile IP environment.

We assume that the foreign agents and home agents do
not fail when serving mobile hosts that are executing the
checkpoint and rollback recovery protocol. MSS and HA
typically have a much smaller failure rate than that of MHs
as they run on fixed networks. Even if they do fail, since
HA and FA are processes that implement carefully defined
state machines, checkpointing and message logging
protocols can be designed relatively easily to tolerate those
failures [22].

There can be multiple processes running on a single
mobile host. They can have fail-stop failures independent of
each other, or fail at the same time as the mobile host.
Finally, we assume that MHs communicate with MSSs using
a FIFO link-level protocol, processes communicate with
each other using TCP (or another reliable transport
protocol) over Mobile IP, and processes execute according to
the piecewise deterministic model.

3  Related  Work 

Elnozahy, Johnson, and Wang developed a general
survey for checkpointing and message logging protocols in
distributed systems [23]. Alvisi and Marzullo have
provided an in-depth treatment of message logging [24]. Rao
and Alvisi compared the cost of recovery for different
message logging approaches [4], and Neves and Fuchs [25]
compared recovery speed for a coordinated checkpoint
protocol [13] and a sender-based message logging
protocol [8].

Acharya and Badrinath [15] introduced a two-phase
method for taking global consistent checkpoints. They
proposed that checkpoints be stored on the stable storage of
mobile support stations instead of on mobile hosts. In their
protocol, processes alternate between two states, SEND
and RECV. If a process is in the SEND state and receives a
message, it is forced to take a checkpoint. During recovery,
the global state is reconstructed from a set of checkpoints
for each process.

Figure 2: Message logging algorithm.

Pradhan et al. analytically evaluated the performance
of different state saving protocols and handoff
strategies [14]. They also suggested storing checkpoints and
message logs at mobile support stations. Their results
indicate that message logging is suitable for mobile
environments except in cases where the mobile host has high
mobility, wireless bandwidth is low, and failure rate is high.

A hybrid checkpoint-recovery protocol for mobile
systems was proposed by Higaki and Takizawa [17]. As a
mobile host moves between cells, it leaves an agent
process on each mobile support station on its itinerary.
During recovery, processes on fixed hosts recover from
consistent checkpoints and processes on mobile hosts restart from
their own checkpoints and roll to a state that is consistent
with those on fixed hosts with the help of agent processes.

Cao and Singhal proved that it is not possible for a
checkpoint algorithm to preserve both the non-blocking and
min-process properties at the same time [18]. This work is
based on an earlier work that tried to achieve non-blocking
and min-process at the same time [16]. They also
proposed a non-blocking mobile checkpointing protocol [19].
Their scheme uses mutable checkpoints to avoid storing
unnecessary checkpoints on mobile support stations.

Neves and Fuchs [13] developed an adaptive
checkpointing scheme for mobile computing. Their protocol
uses time to indirectly coordinate the creation of
recoverable consistent checkpoints. Processes take hard
checkpoints that are sent to home agents and soft checkpoints
that are stored on the local disk of the mobile host.


4  Algorithm  Description 

4.1  Message  Logging  and  Checkpointing 

The message logging and checkpointing algorithm of
this paper consists of three parts: one that executes on the
MSS, one that executes at the HA, and one implemented
by application processes on the mobile host. The protocol
implemented by application processes on the mobile host
is similar to existing message logging protocols. Processes
tag every sent message and store a copy of the tag in
memory. Upon receiving a message, a process also stores the
tag of the received message in memory. The tag includes
the message sequence number, a globally unique process
identifier, and the tag of the last message received by the
sending process. The process periodically writes
checkpoints to the MSS that is currently serving as its foreign
agent. Checkpoints are taken in a non-blocking fashion
using copy-on-write [26]. The checkpoint includes not only
the process state information necessary to recover the
process, but also the tags of the last message it sent and
received before the checkpoint.
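The tagging rules above can be illustrated with a small data-structure sketch. This is a sketch under stated assumptions rather than the implementation used in the paper: the names `Tag` and `LoggedProcess` are hypothetical, and the "tag of the last message received" is stored as a compact (process id, sequence number) pair to keep tags from nesting without bound.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MsgId = Tuple[str, int]  # (process id, sequence number); an assumed compact form


@dataclass(frozen=True)
class Tag:
    seq: int                          # message sequence number at the sender
    pid: str                          # globally unique process identifier
    last_received: Optional[MsgId]    # last message the sender had received


@dataclass
class LoggedProcess:
    pid: str
    next_seq: int = 0
    last_sent: Optional[Tag] = None
    last_received: Optional[Tag] = None

    def send(self, payload):
        prev = self.last_received
        prev_id = (prev.pid, prev.seq) if prev is not None else None
        tag = Tag(self.next_seq, self.pid, prev_id)
        self.next_seq += 1
        self.last_sent = tag          # a copy of the tag is kept in memory
        return tag, payload

    def receive(self, tag, payload):
        self.last_received = tag      # remember the tag of the received message
        return payload

    def checkpoint(self, state):
        # The checkpoint carries the state plus the last sent/received tags.
        return {"state": state,
                "last_sent": self.last_sent,
                "last_received": self.last_received}
```

A checkpoint taken after a receive thus records exactly where in the message stream the saved state sits, which is what later allows log replay to start at the right message.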

Mobile IP maintains the TCP connection as the MH
moves, so switching cells while writing checkpoints does
not create problems. Packets sent by the MH are routed
as conventional IP packets, instead of having to go through the
HA, thus not degrading performance on the MSS. When
the MH switches cells, application processes are notified.
Each process sends to the new MSS a Report Message that
contains the tag of the last message it received from the old
MSS. These messages can be piggybacked on the
registration packet specified by the Mobile IP protocol.

Every time an MH enters an MSS's cell, the MSS assigns
a unique id to the message log of each application
process executing the message logging protocol on the MH. If
the MH has visited this MSS before, the ids assigned are
greater than those assigned previously. Messages destined for mobile
hosts in the MSS's cell are logged on the MSS before being
forwarded. The sequence of messages seen by the mobile
host is the same as that seen and logged by the MSS due to
the FIFO link-level wireless protocol. Some messages can
be logged by the MSS but not yet delivered to the MH.
Such a message is an in-transit message. The new
MSS forwards report messages sent to it by the MH at
cell switch time to the old MSS of the MH. The old MSS
can then use the last received message tag for each
application process to detect in-transit messages and purge
them from its message logs. After a mobile host leaves a
cell, the MSS that the MH just left sends a message to the
MH's HA reporting the message log ids for each
application process on this MH. If an MH writes a checkpoint to
an MSS, the MSS sends a checkpoint message to the MH's
HA indicating that it has finished storing the checkpoint for
the MH. In the event that a process fails on an MH, or the
MH itself fails, the MSS that was serving the MH detects the
error and sends the MH's HA a Failed Message that
contains the message tag of the last sent message for each failed
application.
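The MSS-side bookkeeping described above (increasing log ids, log-before-forward, and purging of in-transit messages on receipt of a report message) might look roughly as follows. The class and method names are hypothetical, tags are simplified to (sender id, sequence number) pairs, and stable storage is modeled by an in-memory dictionary.

```python
from collections import defaultdict
from itertools import count


class MSS:
    def __init__(self):
        self._ids = defaultdict(count)   # per-process counter: ids only increase
        self.logs = {}                   # (pid, log_id) -> [(tag, payload), ...]
        self.current = {}                # pid -> id of the log being written

    def enter_cell(self, pid):
        log_id = next(self._ids[pid])    # greater than any id issued before
        self.logs[(pid, log_id)] = []
        self.current[pid] = log_id
        return log_id

    def deliver(self, pid, tag, payload, forward):
        # Log first, then forward over the FIFO wireless link.
        self.logs[(pid, self.current[pid])].append((tag, payload))
        forward(payload)

    def on_report(self, pid, last_received_tag):
        # The MH has left; anything logged after the last message it actually
        # received was still in transit and is purged from the log.
        log_id = self.current.pop(pid)
        log = self.logs[(pid, log_id)]
        tags = [t for t, _ in log]
        keep = tags.index(last_received_tag) + 1 if last_received_tag in tags else 0
        del log[keep:]
        return log_id                    # id to report to the MH's HA
```

The returned log id is what the old MSS would report to the HA once the MH has left the cell.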

On the HA, a process keeps track of the location of an
MH using the registration procedure specified by the
Mobile IP protocol. It saves the whole itinerary taken by the
mobile host into an array named itinerary array. Each
element of the array is the address of an MSS on the MH's
path. The HA waits for the checkpoint message from the
MSS that contains the location of the last finished
checkpoint of the mobile host. Upon receiving this message, the
HA may begin the garbage collection procedure to reclaim
stable storage on the MSSs.

The HA also serves as an MSS on the home network.
It executes the protocol for MSSs in addition to HA
protocols. If an MH does not use the wireless interface on its
home network, its HA has to deviate from standard Mobile
IP so that the HA can intercept and log messages destined
for MHs that are “at home”.

In our protocol, checkpoints are taken periodically and
stored on MSSs. As one reviewer pointed out, using two
kinds of checkpoints [13], one stored on the MH's
local disk and one on the MSSs, can reduce power
consumption of the mobile host even further (as a smaller
number of checkpoints are sent through the wireless interface)
and achieve even faster recovery if the local disk survives
failure. The HA must be aware of these “possibly stable”
checkpoints so that it can know which message logs should
be sent to the MH. The associated cost is that larger
message logs have to be stored at MSSs and extra processing
performed at HAs. Also, if the local disk on the MH fails,
recovery can take longer. Depending on the application,
network characteristics and type of mobile hosts, this can
serve as a viable alternative to saving all checkpoints on
MSSs.

If processes running on mobile hosts maintain
multiple TCP connections with other processes, they can see
a sequence of messages different from that seen by the
foreign agents due to scheduling in the thread library. During
recovery they could take an execution path different from
that before the failure. To ensure correctness, we limit
applications running on the mobile hosts to either a single
message queue shared by multiple processes or a single
process with one TCP connection.

Figure  2  illustrates  the  effects  of  the  logging  algorithm. 
As  a  mobile  host  moves  along  the  dashed  line,  MSSs  on  its 
itinerary  log  messages  for  the  mobile  host.  The  mobile  host 
takes  a  checkpoint  while  at  MSS2.  All  this  information  is 
sent  to  the  HA  for  further  processing. 


Figure  3:  Garbage  collection. 

4.2  Distributed  State  Information  and  Garbage 
Collection 

As  the  MH  moves  from  cell  to  cell,  it  leaves  message 
logs  and  checkpoints  on  the  MSSs  that  are  on  its  itinerary. 
Itinerary  information  kept  at  its  HA  is  used  to  reconstruct 
a  consistent  state  for  the  MH.  The  HA  knows  on  which 
MSS  the  last  checkpoints  are  stored  for  each  application 
process  on  the  MH  by  examining  the  checkpoint  messages 
sent  to  it  by  MSSs.  Messages  logged  after  the  checkpoint 
will  have  to  be  replayed  in  order  for  processes  on  the  MH  to 
recover.  All  other  logged  messages  and  previously  stored 
checkpoints  are  no  longer  needed  for  recovery. 

An example is sketched in Figure 3. Assume there is
only one application process P executing on the MH. The
MH moves from MSS1 to MSS3 and P takes a checkpoint
on MSS2. The HA has the MH's itinerary and the location of
P's checkpoint stored in its memory. Since the HA knows
that a checkpoint for P has been taken at MSS2, the HA can
now ask MSS1 to delete message logs for process P, and
ask MSS2 to delete messages logged prior to the
checkpoint since they are no longer necessary for recovery.

Garbage collection is straightforward. After receiving
a checkpoint message from an MSS, the HA examines its
itinerary array and determines which message log is useful
and which is rendered obsolete by the checkpoint. The HA
sends out requests to those MSSs with stale message logs
and checkpoints so that they can garbage collect the stable
storage. This request includes message log identifiers so
that the MSS can distinguish among multiple logs in case
the MH visits the MSS several times. Upon receiving the
request, the MSS removes obsolete message logs and/or
checkpoints, then sends back an ACK to the HA. After
receiving all the ACKs, the HA trims its itinerary array. The HA
can also start the garbage collection either periodically to
limit the length of the itinerary array or if stable storage at
the MSS becomes depleted.

Figure 4: Recovery algorithm.
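The HA's decision can be sketched as a pure function over the itinerary array. This is an illustrative simplification: the function name, the request labels, and the representation of an itinerary entry as an (MSS, log id) pair are assumptions, and the ACK exchange that precedes trimming is omitted.

```python
def plan_garbage_collection(itinerary, ckpt_index):
    """Return (requests, trimmed_itinerary).

    itinerary  : list of (mss, log_id) pairs in visit order (assumed layout)
    ckpt_index : position of the entry whose MSS stored the latest checkpoint
    """
    # Every log written before the MH reached the checkpoint MSS is obsolete.
    requests = [(mss, log_id, "delete_log") for mss, log_id in itinerary[:ckpt_index]]
    # On the checkpoint MSS only messages logged before the checkpoint go.
    mss, log_id = itinerary[ckpt_index]
    requests.append((mss, log_id, "delete_before_checkpoint"))
    # In the protocol, trimming would happen only after all ACKs arrive.
    return requests, itinerary[ckpt_index:]


# The scenario of Figure 3: MSS1 -> MSS2 (checkpoint) -> MSS3.
reqs, kept = plan_garbage_collection([("MSS1", 0), ("MSS2", 0), ("MSS3", 0)], 1)
print(reqs)
print(kept)
```

Including the log id in each request is what lets an MSS that was visited several times tell its logs apart.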

4.3  Recovery  Algorithm 

A mobile host restarts a failed process by sending to
its HA a message containing the id of the process to be
restarted. The HA responds by sending to the MH the
message tag of the last message sent out by the process.
The HA determines which MSS currently holds the
latest checkpoint for this process and asks the MSS to send
the checkpoint to the MH. Then the HA sends requests to
MSSs that hold message logs for the process, which then
in turn replay the logs so that the process receives messages
in the same order as before failure. When replaying the
logged messages, the MSSs mark them as “replayed” so
that they are not logged by the receiving MSS. If other
processes continue to send messages to the recovering process,
these messages will be logged and sent to the MH as
normal messages, but they are not delivered to the application
until after the recovery is complete. Figure 4 illustrates this
procedure.
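Under the piecewise deterministic assumption, recovery amounts to restoring the checkpoint and re-applying the logged messages in their original order, with newly arriving messages held back until replay ends. A minimal sketch follows; the function name and the `apply` callback are hypothetical stand-ins for the application's message handler.

```python
def recover(checkpoint_state, logs, held_back, apply):
    """Rebuild process state after a failure.

    logs      : message logs in itinerary order, oldest first
    held_back : messages that arrived during recovery (logged as normal
                messages, but delivered only after replay completes)
    apply     : deterministic handler, apply(state, message) -> new state
    """
    state = checkpoint_state
    for log in logs:
        for message in log:          # same order as before the failure
            state = apply(state, message)
    for message in held_back:
        state = apply(state, message)
    return state


# Toy handler: the state is simply the list of messages seen so far.
history = recover([], [["m1", "m2"], ["m3"]], ["m4"], lambda s, m: s + [m])
print(history)
```

No other process takes part in this loop, which is why recovery is independent.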

If an application attempts to send messages during
recovery, the message tag is compared to the tag of the last
message sent by the process before failure. If the tag
indicates that the message was sent before the failure,
the message is not transmitted by the MH. This prevents
the MH from re-transmitting application messages
previously sent during recovery, thereby saving bandwidth and
battery power. Failures during recovery are handled in the
same way as failures during normal execution, since
messages sent to the MH during recovery are logged on the
MSS as normal messages.
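The duplicate-suppression rule can be sketched by comparing sequence numbers. The class name and fields below are hypothetical, and the comparison assumes that sequence numbers increase by one per sent message.

```python
class RecoveringSender:
    def __init__(self, next_seq, last_sent_seq):
        self.next_seq = next_seq            # restored from the checkpoint
        self.last_sent_seq = last_sent_seq  # from the tag supplied by the HA

    def send(self, payload, transmit):
        seq = self.next_seq
        self.next_seq += 1
        if seq <= self.last_sent_seq:
            return False                    # already sent before the failure
        transmit(seq, payload)              # genuinely new message
        return True


out = []
s = RecoveringSender(next_seq=3, last_sent_seq=4)
for p in ("a", "b", "c"):
    s.send(p, lambda seq, payload: out.append((seq, payload)))
print(out)
```

In this example the checkpoint was taken before messages 3 and 4 were sent, so only message 5 reaches the wireless link.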


4.4  Limited  Stable  Storage  on  Mobile  Support 
Stations 

If a mobile support station depletes its stable storage
while trying to store checkpoints or message logs on behalf
of mobile hosts, it has to either halt and perform garbage
collection or find alternative storage. Halting an MSS
effectively blocks every process on every mobile host in the
MSS's cell. All incoming packets for mobile hosts are lost
and must be resent later. Managing stable storage on the
MSSs to reduce the frequency of blocking is therefore
critical. Stable storage management is the focus of another
recent project [20].

One  way  to  reduce  the  possibility  of  halting  an  MSS  is 
to  use  watermarks.  When  free  stable  storage  on  an  MSS 
reaches  a  low  watermark,  that  MSS  selects  a  process  as  the 
target  of  garbage  collection,  forces  it  to  take  a  checkpoint 
(maybe  on  another  MSS),  and  discards  previous  message 
logs  and  checkpoints  saved  for  that  process. 
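A watermark check of this kind could be sketched as follows. The text does not say how the target process is selected, so the policy used here (the process occupying the most storage) is purely an assumption, as are the function and parameter names.

```python
def pick_gc_target(free_bytes, low_watermark, usage_by_process):
    """Return the process to force-checkpoint, or None if storage is fine.

    usage_by_process maps a process id to the bytes of logs and checkpoints
    held for it on this MSS. Choosing the largest consumer is an assumed
    policy, not one prescribed by the protocol description.
    """
    if free_bytes > low_watermark or not usage_by_process:
        return None
    return max(usage_by_process, key=usage_by_process.get)


print(pick_gc_target(40, 50, {"P": 10, "Q": 30}))
```

Once the selected process has taken its forced checkpoint, the logs and checkpoints previously saved for it can be discarded.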

Halting an MSS can also be avoided when storage is
depleted by forwarding the logs and checkpoints to the
mobile host's home agent, if there is enough bandwidth in the
backbone network. The MSS can then execute the garbage
collection algorithm. Another alternative is to have other
MSSs or routers on the route of the packets store message
logs. The MSS on the last hop can simply forward packets
to the MH without logging them. This requires some
signaling messages to be sent to the HA so that the HA knows
the exact locations of the logs.

5  Experimental  Results 

We  compare  the  performance  of  our  protocol  with  an 
ideal  coordinated  checkpoint  protocol  that  takes  periodic 
checkpoints  without  exchanging  any  messages.  Failure 
free  overhead  and  recovery  time  are  evaluated. 

5.1  Experiment 

The specific environment used in the experiment is
shown in Figure 5. A Sun Sparc 20 workstation running
Solaris 2.6 with 320 MB of memory and a Lucent Technology
Wavepoint II [27] connected by 10 Mbps Ethernet served as
the mobile support station. Checkpoints and message logs
were stored on a dedicated file server that was connected
to the workstation using a high speed ATM network. Two
Pentium II 300MHz PCs equipped with Lucent
Technology's Wavelan [27] wireless interface cards served as the
mobile hosts. The PCs were running Windows NT 4.0 with
256 MB of memory each. In our implementation, processes
executed pre-generated traces that emulated WWW
browsing behavior. The processes also functioned as servers and
they read requests and generated replies in both sleep and
request states. There were four client processes running,
two on each PC. Each client was assigned a unique ID.
Table 1: Execution Time Overhead Comparison.

Trace  unmodified   ckpt          ckpt         log with ckpt  log with ckpt
       (seconds)    (seconds)     Overhead(%)  (seconds)      Overhead(%)
T1     1180.2       1243.6        5.3          [illegible]    [illegible]
T2     869.1        9067.5 [sic]  4.3          915.5          5.34
T3     1013.4       1060.1        4.6          [illegible]    5.20
T4     871.5        898.1         3.0          [illegible]    3.92
T5     [illegible]  911.3         5.0          [illegible]    4.32
T6     866.7        [illegible]   4.9          913.1          5.35
T7     850.7        891.9         4.8          876.6          3.04

Figure 5: Experiment environment.


For routing and processing, a client attached to each
message it transmitted a header that contained the destination
ID, the type of message (request or reply), and the ID of
the sender. Clients did not send messages directly to each
other. Instead, they sent messages over the wireless
network to a server process running on the Sun workstation.
The server process was responsible for storing checkpoints
and message logs as requested by the client processes. It
also functioned as a router by examining the header field
of each message and forwarding messages to their
destination.

Checkpoints were not actual restartable process states,
but were large messages intended to represent a range of
reasonable state sizes for mobile hosts. Client processes
had a separate thread that periodically sent checkpoints to
a server. The server process also had separate threads that
listened for checkpoints and wrote them to disk.
Checkpointing was asynchronous.


5.2  Trace  Generation 

Request traces were generated from four individual
traces to represent a variety of network loads. These four
individual traces represented the process sleep interval
(SLEEP), number of requests sent during each active
interval (NUMBER), the request packet distribution (RQ),
and the reply packet distribution (RY). The request packet
sizes were small, as HTTP requests are typically less
than several hundred bytes. Reply packets were large with
a large variance to reflect the nature of a typical web
server's output. When accessing a web page, several
requests are typically sent to the web server. We captured
this behavior with the NUMBER trace.

A utility program read the first three traces and
generated request traces used by each client. This program first
read a value from the SLEEP trace and multiplied it by
1000 and a predefined coefficient to obtain the sleep time in
milliseconds. The time value was written into the request
trace file. The NUMBER trace was then read to determine
how many request events were to be written out and the
destination for these requests was randomly chosen. For
every request event, the request length was read from the
RQ trace. Seven traces (T1 ... T7) were generated, with
coefficients of 1/2, 1/4, ... 1/128. By using several ratios, we
obtained a variety of traces that represent distinct network
loads.
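The trace generation procedure can be sketched as follows. This is a reconstruction from the description above rather than the original utility: the function name, the event tuple layout, and the choice of one random destination per burst of requests are assumptions, and the RQ trace is assumed to be long enough for all requests.

```python
import random


def generate_request_trace(sleep, number, rq, coefficient, destinations, rng):
    """Interleave sleep events and request bursts for one client."""
    events = []
    rq_values = iter(rq)
    for sleep_value, burst_size in zip(sleep, number):
        # SLEEP value * 1000 * coefficient gives the sleep time in ms.
        events.append(("sleep_ms", sleep_value * 1000 * coefficient))
        destination = rng.choice(destinations)      # randomly chosen
        for _ in range(burst_size):
            events.append(("request", destination, next(rq_values)))
    return events


# Coefficients 1/2, 1/4, ..., 1/128 correspond to traces T1 ... T7.
coefficients = [1 / 2 ** k for k in range(1, 8)]
trace = generate_request_trace([2.0], [2], [100, 120], coefficients[0],
                               ["client-1"], random.Random(0))
print(trace)
```

Halving the coefficient shortens every sleep interval, so higher-numbered traces put more load on the network.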

5.3  Failure  Free  Overhead 

The applications were first executed unmodified
without message logging and checkpointing. They then ran
with only periodic checkpointing enabled. Finally, the
application executed with both checkpointing and message
logging enabled. The checkpointing interval was five
minutes. The execution times shown in the following figures
and tables are the average of three runs. Total execution
time of the three cases for different traces was measured
and is shown in Table 1 and Figure 6. The overheads
incurred by the coordinated checkpointing protocol and the
message logging protocol are also shown.

Figure 6: Normal execution overhead.

Figure 7: Recovery time.
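The overhead percentages in Table 1 appear to be the relative increase in execution time over the unmodified run. The sketch below recomputes two of the legible entries under that assumption; the function name is hypothetical.

```python
def overhead_percent(t_protocol, t_unmodified):
    """Relative slowdown of a protocol run over the unmodified run, in %."""
    return (t_protocol - t_unmodified) / t_unmodified * 100.0


# Legible Table 1 entries for logging with checkpointing.
print(round(overhead_percent(915.5, 869.1), 2))   # T2
print(round(overhead_percent(876.6, 850.7), 2))   # T7
```

Both values agree with the percentages listed for T2 and T7 in the table (5.34 and 3.04).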

5.4  Recovery  Time 

Recovery time is obtained by measuring the time for a
process to read the checkpoint and proceed to a specific
execution point. In our experiments, we chose that point to
be after the 500th event in the request trace file. No
checkpointing or message logging takes place during recovery.
Recovery times for the consistent checkpointing protocol
and the message logging protocol are measured and shown
in Table 2 and Figure 7.

5.5  Discussion 

From the failure-free overhead data we see that for
all of the traces message logging has similar performance
to checkpointing without logging. The largest difference
is less than 2 percent of the execution time without any
checkpoints. When message logging is performed, the
processes take checkpoints at the same frequency as the
standard checkpointing protocol. The experiments illustrate
that the overhead due to logging messages at the MSS is
negligible. The reason is that messages are not written to
disk before being forwarded to the MH.

Table 2: Recovery time.

Trace  Using ckpts  Using message logs
       (seconds)    (seconds)
T1     326.5        131.9
T2     200.4        19.82
T3     260.0        70.28
T4     191.00       23.16
T5     220.9        44.14
T6     202.6        27.67
T7     195.5        18.98

As is known for fixed networks, recovery using message
logging is often faster than that based on standard
checkpoints. The processes recovered three to ten times faster
using message logs than with checkpoints in our
experiments with the Wavelan wireless network. Processes did
not have to block and wait for other processes to transmit
messages. Messages were also transmitted over the
wireless link just once instead of twice, as in normal execution,
resulting in less contention on the wireless network and
lower latency for message transmission.
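The speedup for each trace can be recomputed directly from the recovery times in Table 2. The snippet below is only a worked calculation on the tabulated values; the variable names are arbitrary.

```python
# (recovery with checkpoints, recovery with message logs), in seconds.
recovery = {
    "T1": (326.5, 131.9), "T2": (200.4, 19.82), "T3": (260.0, 70.28),
    "T4": (191.00, 23.16), "T5": (220.9, 44.14), "T6": (202.6, 27.67),
    "T7": (195.5, 18.98),
}

speedup = {trace: ckpt / log for trace, (ckpt, log) in recovery.items()}
for trace, ratio in speedup.items():
    print(trace, round(ratio, 1))
```

On the tabulated values the ratios run from roughly 2.5 (T1) to roughly 10.3 (T7), with the lightly loaded traces at the low end.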

An interesting phenomenon is that for some traces the
overhead due to message logging and checkpointing is
actually less than that due to checkpointing alone. One explanation
is that the action of logging messages changed the timing of
messages transmitted on the wireless network and thereby
reduced contention.

6  Conclusions 

This paper described a message logging protocol for
mobile hosts, mobile support stations and home agents in
a Mobile IP environment. An approach to organizing the
distributed state information was also presented. The
organizing scheme provides easy garbage collection without
participation from mobile hosts and can tolerate the case
where some mobile support stations do not have enough
stable storage for mobile hosts to store state information.

Failure-free overhead and recovery speed were
compared between an ideal consistent checkpoint protocol and
our message logging protocol. Message logging incurred
only marginally larger overhead during failure-free
operation compared to the ideal consistent checkpointing
protocol. Message logging has a decided advantage in recovery
with mobile wireless networks.